
<!-- This form documents the artifacts associated with the article (i.e., the data and code supporting the computational findings) and describes how to reproduce the findings. -->

# Closed testing with Globaltest, with application in metabolomics

## Data

### Abstract
We develop a multiple testing procedure that corrects for testing multiple feature sets while controlling the family-wise error rate. Our core application is to pathway databases, but the procedure is not limited to pathways: it can also correct across other functional annotation databases, such as interactions with proteins or biological roles (e.g. biofunctions).

We focus on data with a binary response, including simulated data and real data from metabolomics experiments. All real data used in this work are publicly available.
<!--
Provide a short (< 100 words), high-level description of the data
-->

### Availability

#### Real datasets

- [x] Data set "Eisner" is available online at MetaboAnalyst: https://www.metaboanalyst.ca/resources/data/human_cachexia.csv

- [x] Data set "Bordbar" was downloaded from MetaboLights: https://www.ebi.ac.uk/metabolights/MTBLS23/files

- [x] Data set "Taware" was downloaded from MetaboLights: https://www.ebi.ac.uk/metabolights/MTBLS760/files

- [x] Data set "Al-Mutawa" was downloaded from MetaboLights: https://www.ebi.ac.uk/metabolights/MTBLS541/files

#### Annotation databases

- [x] Pathway databases "KEGG", "Biocyc", "SMPDB" and other annotation databases "HMDB biofunctions", "Protein interactions (HMDB)" were downloaded from MBROLE: http://csbg.cnb.csic.es/mbrole2/analysis.php

- [x] WikiPathways pathways were obtained from the R package "rWikiPathways" (DOI: 10.18129/B9.bioc.rWikiPathways)


#### Simulated data 
The simulated data used in this work consist of four parts:

- [x] For the motivating example in the introduction

- [x] For the recursive toy examples 

- [x] For the computing time comparisons

- [x] For the comparisons between CTGT and CTST


#### Data format

We cleaned the data sets by removing missing values and filtering out low-expressed metabolites (those with >= 50% zeros). All processed real data sets and the pathway databases mentioned above are saved as .RData files, which users can load directly for further research and to reproduce our results.
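The cleaning rule described above can be sketched on a toy matrix. This is only an illustration and not the code in the dataandpathway_*.R scripts: the object names (`X`, `X_clean`), the samples-in-rows layout, and the choice to drop whole samples that contain a missing value are all assumptions made for the example.

```r
# Hypothetical toy data: rows are samples, columns are metabolites.
X <- cbind(
  m1 = c(0, 0, 0, 0, 0, 0),   # 100% zeros
  m2 = c(3, 0, 0, 0, 5, 1),   # exactly 50% zeros
  m3 = c(2, 4, 0, 1, 7, 3),   # few zeros
  m4 = c(1, NA, 2, 6, 4, 9)   # contains a missing value
)

# Assumption: "removing missing values" means dropping incomplete samples.
X_complete <- X[complete.cases(X), , drop = FALSE]

# Drop metabolites whose proportion of zeros is >= 50%.
zero_frac <- colMeans(X_complete == 0)
X_clean <- X_complete[, zero_frac < 0.5, drop = FALSE]

# Save the processed data as an .RData file and load it back.
rdata_file <- tempfile(fileext = ".RData")
save(X_clean, file = rdata_file)
loaded_names <- load(rdata_file)  # load() returns the names of restored objects
```

Loading an .RData file restores the saved objects under their original names, which is why `load()` is called for its side effect and only returns those names.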


## Code

This folder contains all files necessary to reproduce the results in the tables and figures. It includes 5 subfolders:

### 1. example

(a) gmincmax.R and gmincmax_plot.R: code for generating the simulated data and saving the necessary results, which are used to generate Figures 1 and 2.

(b) BAB_plot.R: code for generating Figure 3, illustrating the branch and bound algorithm.

(c) algorithm1_supp.R and algorithm1_supp_plot.R: code for illustrating Algorithm 1 in the supplement.

### 2. realdata

(d) dataandpathway_eisner.R, dataandpathway_bordbar.R, dataandpathway_taware.R and dataandpathway_almutawa.R: functions and code to clean the real data sets and obtain the annotation databases, which are saved as .RData files in subfolders Eisner, Bordbar, Taware and Al-Mutawa respectively.

(e) FWER_excess.R: functions to show the inflation of FWER when integrating multiple annotation databases and to generate Table 1.

(f) data_info.R: code to obtain Table 2 and Table 3. 

(g) rejections_perdataset.R and triangulartable.R: functions to generate Table 4. Running rejections_perdataset.R takes around 7 hours, so we save the corresponding results as res_eisner.RData, res_bordbar.RData, res_taware.RData and res_almutawa.RData in subfolders Eisner, Bordbar, Taware and Al-Mutawa respectively.

(h) pathwaysizerank.R: code for generating Figure 4 based on res_eisner.RData from (g).

(i) iterationandtime_plot.R: code for generating Figure 5 based on the "Al-Mutawa" data. The code is very time-consuming (nearly 5 days), so we save the corresponding results and plot them in the main manuscript with pgfplots.
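As background for item (e), the following back-of-the-envelope calculation shows why analysing several annotation databases separately can inflate the FWER. It is a textbook illustration, not the computation in FWER_excess.R. The level `alpha`, the range of `k`, and the independence between analyses are assumptions, and since real annotation databases overlap, the independence formula is only a reference point.

```r
# Illustrative only: k separate analyses, each with FWER exactly alpha.
alpha <- 0.05
k <- 1:6  # hypothetical number of annotation databases combined

# Chance of at least one false rejection if the k analyses were independent.
fwer_indep <- 1 - (1 - alpha)^k

# Union (Bonferroni-type) upper bound, valid under any dependence.
fwer_bound <- pmin(1, k * alpha)

inflation <- data.frame(k = k,
                        fwer_indep = round(fwer_indep, 3),
                        fwer_bound = fwer_bound)
print(inflation)
```

Under these assumptions the overall error rate grows with every database added, which is why a correction across databases is needed.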

### 3. alpha0

(j) alpha0_calculation.R and alpha0_plot.R: functions and code for checking the assumption on alpha0, i.e. alpha0 >= 5%. We show in the supplementary material that alpha0 is around 30%. The result is based on the Eisner data, but similar patterns can be found in the other data sets.

### 4. simulation_study

(k) run_sim.R and tables.R: functions and code for generating artificial data to compare CTGT with CTST. Running run_sim.R takes around 4 hours, so we save the results in gt_mn0_100.RData, ..., st_mn9_150.RData.

### 5. simultation_supportInfo

(l) time_ct.R, time_shortcut.R and time_plot.R: functions and code for measuring the computing time of full closed testing and of the shortcut procedure. We have saved the results in ACT.RData and CT.RData, since they take a long time to compute. The plot is shown in the supporting information.

### Additional information

To run the code, the user will need to open the R project within each subfolder:

(1) ./example/example.Rproj

(2) ./realdata/realdata.Rproj

(3) ./alpha0/alpha0.Rproj

(4) ./simulation_study/simulations.Rproj

(5) ./simultation_supportInfo/simulation.Rproj
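Because several scripts take hours or days to run, a user may prefer to reuse the saved .RData results rather than recompute them. The helper below is a minimal sketch of that pattern. `load_or_run` is a hypothetical name that does not appear in the archive, and the demonstration uses a temporary file in place of a real result file such as res_eisner.RData.

```r
# Hypothetical helper: return saved objects if the .RData file exists,
# otherwise fall back to running the (possibly slow) computation.
load_or_run <- function(rdata_path, compute) {
  if (file.exists(rdata_path)) {
    env <- new.env()
    load(rdata_path, envir = env)  # restore saved objects into env
    return(as.list(env))           # named list of the saved objects
  }
  compute()
}

# Demonstration with a temporary file standing in for a saved result.
saved_file <- tempfile(fileext = ".RData")
res <- c(a = 1, b = 2)
save(res, file = saved_file)

# File exists: the saved object is returned and compute() is never called.
from_disk <- load_or_run(saved_file, function() stop("not reached"))

# File missing: the computation runs instead.
fresh <- load_or_run(tempfile(fileext = ".RData"),
                     function() list(res = c(a = 0, b = 0)))
```

With the relevant .Rproj open, relative paths to the saved results resolve against that subfolder, so the same pattern can be applied to the files listed above.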

